# feat: add pytest AI validation and Robot Framework BDD testing samples #1
## Conversation
- Reposition as THE comprehensive Python SDK sample collection
- Highlight E2E proof (15/15 passing) as key differentiator
- Better comparison table vs typical SDK examples
- Clearer value proposition upfront
- Add CI status and E2E badges
- Emphasize free tier (gpt-5-mini) compatibility
- Reorganize samples into clear categories
- Add acknowledgments for GitHub Copilot SDK
- pytest_ai_validation.py: AI-enhanced pytest with 4 test scenarios (code generation, bug detection, structured JSON, AI-as-judge)
- robot_copilot_library.py: Robot Framework keyword library wrapping Copilot SDK + standalone BDD runner (3 Gherkin scenarios)
- copilot_bdd.robot: BDD test suite with Given/When/Then syntax for AI agent testing (code gen, code review, JSON output, explanations)
- Update requirements.txt with pytest, pytest-asyncio, robotframework
- Update README: 15 -> 17 samples, add AI-Enhanced Testing section
- Update E2E runner to include new samples in scenario suite
**Pull request overview**
Adds new “AI-enhanced testing” sample scripts showing how to integrate Copilot SDK–powered agent validation into established test frameworks (pytest + Robot Framework BDD), and wires them into docs and the E2E scenario runner.
Changes:
- Added a pytest-based AI validation sample with reusable async scenarios and optional pytest integration.
- Added a Robot Framework keyword library + `.robot` BDD suite demonstrating Given/When/Then AI testing.
- Updated requirements, README catalog/claims, and the E2E scenario runner to account for the new samples.
**Reviewed changes**
Copilot reviewed 5 out of 6 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| `scripts/run_agent_scenarios.py` | Adds reporting of `.robot` files to the E2E scenario summary. |
| `samples/robot_copilot_library.py` | New Robot Framework keyword library + standalone BDD-style runner. |
| `samples/pytest_ai_validation.py` | New pytest AI-validation sample + standalone runner. |
| `samples/copilot_bdd.robot` | New Robot Framework BDD suite consuming the keyword library. |
| `requirements.txt` | Adds pytest/pytest-asyncio/robotframework dependencies. |
| `README.md` | Updates sample catalog and adds AI-Enhanced Testing section + badges/claims. |
**samples/pytest_ai_validation.py**

```python
import ast
import json
import re
import sys
```
**Copilot AI** · Feb 9, 2026
sys is imported but never used, which will fail the repo's ruff check CI step. Remove the unused import (or use it explicitly if needed).
Suggested change:

```diff
- import sys
```
```python
def json_should_be_valid(self) -> dict:
    """Assert that the last response is valid JSON and return the parsed dict."""
    try:
        data = json.loads(self._last_response)
        return data
    except json.JSONDecodeError as e:
        raise AssertionError(
            f"Invalid JSON: {e}\n\nRaw:\n{self._last_response[:300]}"
        )

def json_should_have_keys(self, *keys):
    """Assert that the parsed JSON contains all specified keys.

    Example (Robot):
        JSON Should Have Keys    name    age    email
    """
    data = self.json_should_be_valid()
    missing = set(keys) - set(data.keys())
    if missing:
```
**Copilot AI** · Feb 9, 2026
json_should_be_valid() is annotated to return dict but returns whatever json.loads() yields, and json_should_have_keys() assumes data.keys() exists. If the model returns a JSON array/string, this will raise AttributeError instead of a clear assertion failure. Assert isinstance(data, dict) (and improve the error message) before using .keys().
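The suggested guard can be sketched as a standalone helper — hypothetical code, mirroring the keyword names and `_last_response` attribute from the diff above, not the sample's actual class:

```python
import json


class JsonAssertions:
    """Hypothetical standalone version of the sample's JSON keywords,
    with the isinstance guard the review suggests."""

    def __init__(self, last_response: str):
        self._last_response = last_response

    def json_should_be_valid(self) -> dict:
        try:
            data = json.loads(self._last_response)
        except json.JSONDecodeError as e:
            raise AssertionError(
                f"Invalid JSON: {e}\n\nRaw:\n{self._last_response[:300]}"
            )
        # Fail with a clear message on top-level arrays/strings instead of
        # letting callers hit AttributeError on .keys() later.
        if not isinstance(data, dict):
            raise AssertionError(
                f"Expected a JSON object, got {type(data).__name__}:\n"
                f"{self._last_response[:300]}"
            )
        return data

    def json_should_have_keys(self, *keys) -> dict:
        data = self.json_should_be_valid()
        missing = set(keys) - set(data.keys())
        if missing:
            raise AssertionError(f"JSON missing keys: {sorted(missing)}")
        return data
```

With this guard, a model that returns `[1, 2, 3]` produces an assertion failure naming the actual type rather than a confusing `AttributeError`.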
@copilot open a new pull request to apply changes based on this feedback
```python
# Report .robot files (run via robot_copilot_library.py standalone)
for robot_file in sorted(samples_dir.glob("*.robot")):
    results.append(ScenarioResult(
        robot_file.stem,
        True,
        "SKIP - Run via: robot samples/copilot_bdd.robot (BDD scenarios tested through robot_copilot_library.py)"
    ))
```
**Copilot AI** · Feb 9, 2026
Adding .robot files to results increases the scenario count reported by this runner (e.g., 17 .py samples + 1 .robot entry). This will desync the README’s “17/17” E2E badge/transcript unless those are updated too. Consider excluding .robot from the scenario count here, or update the README/badge to match the new total.
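One way to keep the totals in sync is to list `.robot` suites separately so they never inflate the counted scenarios — a hypothetical sketch (the function name and summary shape are assumptions; the real runner's `ScenarioResult` handling is not shown here):

```python
from pathlib import Path


def split_scenarios(samples_dir: Path) -> tuple[list[str], list[str]]:
    """Count only .py samples as E2E scenarios; report .robot suites as notes.

    Hypothetical helper: keeps the "N/N" total stable while still surfacing
    the BDD suites and how to run them.
    """
    counted = sorted(p.stem for p in samples_dir.glob("*.py"))
    notes = sorted(
        f"{p.stem}: SKIP - run via `robot samples/{p.name}`"
        for p in samples_dir.glob("*.robot")
    )
    return counted, notes
```

The README badge then only needs updating when a counted `.py` sample is added or removed.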
```python
)
response = await session.send_and_wait({"prompt": prompt})
result = response.data.content.strip()
passed = result.upper().startswith("PASS")
```
**Copilot AI** · Feb 9, 2026
ai_judge() asks for EXACT 'PASS'/'FAIL' on the first line, but the parser accepts any response starting with "PASS" (e.g., "PASS ✅" or "PASSING"). Parse the first line strictly (splitlines()[0] == 'PASS'/'FAIL') and treat anything else as a test failure to keep the oracle deterministic.
Suggested change:

```diff
- passed = result.upper().startswith("PASS")
+ first_line = result.splitlines()[0].strip() if result else ""
+ passed = first_line == "PASS"
```
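That strict oracle can also be factored into a small helper that rejects anything other than an exact verdict — a sketch (the `judge_verdict` name is hypothetical, not from the sample):

```python
def judge_verdict(result: str) -> bool:
    """Return True/False only for an exact first-line PASS/FAIL verdict.

    Anything else (e.g. "PASSING", "PASS ✅", an empty response) is treated
    as a judge error, keeping the oracle deterministic as the review suggests.
    """
    first_line = result.splitlines()[0].strip() if result else ""
    if first_line not in ("PASS", "FAIL"):
        raise AssertionError(f"Judge did not answer PASS/FAIL: {first_line!r}")
    return first_line == "PASS"
```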
```python
total = len(SCENARIOS)
print(f"Results: {passed} passed, {failed} failed out of {total}")
if failed == 0:
    print("\n✅ All AI validation tests passed!")
else:
    print("\n⚠️ Some tests failed (AI responses are non-deterministic)")
```
**Copilot AI** · Feb 9, 2026
In standalone mode, failures only increment failed and print a warning, but main() always exits successfully. Since scripts/run_agent_scenarios.py treats “no exception” as PASS, this can produce false-green E2E results. Return an exit code and/or raise SystemExit(1) when failed > 0.
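A minimal sketch of that fix — the scenario loop is stubbed out (an assumption standing in for the sample's real loop over `SCENARIOS`); only the exit-code plumbing is the point:

```python
import asyncio
import sys


async def run_scenarios() -> tuple[int, int]:
    """Stub standing in for the sample's real loop over SCENARIOS."""
    return (4, 0)  # (passed, failed)


def main() -> int:
    passed, failed = asyncio.run(run_scenarios())
    print(f"Results: {passed} passed, {failed} failed")
    if failed == 0:
        print("\n✅ All AI validation tests passed!")
        return 0
    print("\n⚠️ Some tests failed (AI responses are non-deterministic)")
    return 1  # non-zero exit so the E2E runner sees a real failure


if __name__ == "__main__":
    sys.exit(main())
```

Because `run_agent_scenarios.py` treats "no exception" as PASS, propagating a non-zero exit code here is what actually surfaces failures in CI.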
**samples/robot_copilot_library.py**

```python
import sys
from pathlib import Path
```
**Copilot AI** · Feb 9, 2026
sys and Path are imported but never used, which will fail the repo's ruff check CI step. Remove the unused imports (or use them explicitly).
Suggested change:

```diff
- import sys
- from pathlib import Path
```
```python
if failed == 0:
    print("\n✅ All BDD scenarios passed!")
else:
    print("\n⚠️ Some scenarios failed")
```
**Copilot AI** · Feb 9, 2026
In standalone mode, scenario failures only increment failed and print, but main() always completes successfully. Because the E2E runner marks “no exception” as PASS, this can hide failing BDD scenarios. Return a non-zero exit code and/or raise SystemExit(1) when failed > 0.
Suggested change:

```diff
  print("\n⚠️ Some scenarios failed")
+ raise SystemExit(1)
```
```markdown
[](https://www.python.org/downloads/)
[](https://github.com/github/copilot-sdk)
[](https://github.com/Michspirit99/copilot-sdk-python/actions/workflows/ci.yml)
[](https://github.com/Michspirit99/copilot-sdk-python/actions/workflows/agent-scenarios.yml)
```
**Copilot AI** · Feb 9, 2026
The E2E badge hard-codes 17/17, but scripts/run_agent_scenarios.py now also reports .robot files as scenarios, which will change the total. Update the badge (or adjust the runner) so the displayed E2E count matches the workflow output.
Suggested change:

```diff
- [](https://github.com/Michspirit99/copilot-sdk-python/actions/workflows/agent-scenarios.yml)
+ [](https://github.com/Michspirit99/copilot-sdk-python/actions/workflows/agent-scenarios.yml)
```
@Michspirit99 I've opened a new pull request, #2, to work on those changes. Once the pull request is ready, I'll request review from you.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
## Summary
Adds 2 new sample categories (pytest and Robot Framework BDD) to demonstrate how AI agents powered by the Copilot SDK can be integrated into established test frameworks. Bumps the sample count from 15 → 17.
### New Samples

**🧪 pytest AI Validation (`pytest_ai_validation.py`)**
- Deterministic checks (`ast.parse`, `json.loads`) combined with AI validation
- `copilot_session` pytest fixture for lifecycle management
- Runs standalone (`python samples/pytest_ai_validation.py`) or with `pytest -v`

**🤖 Robot Framework BDD (`robot_copilot_library.py` + `copilot_bdd.robot`)**
- Given/When/Then keywords: `Given I have a Copilot session` / `When I ask Copilot to generate code` / `Then the code should be valid Python`
- Runs standalone (`python samples/robot_copilot_library.py`) or with `robot samples/copilot_bdd.robot`

### Other Changes

- Adds `pytest`, `pytest-asyncio`, `robotframework` to requirements
- `.robot` file with Given/When/Then syntax (4 test cases)

### Local Test Results

### Key Patterns Demonstrated

- pytest fixture for session lifecycle (`copilot_session`)